-
Data valuation in machine learning is concerned with quantifying the relative contribution of a training example to a model’s performance. Quantifying the importance of training examples is useful for identifying high- and low-quality data to curate training datasets and for addressing data quality issues. Shapley values have gained traction in machine learning for curating training data and identifying data quality issues. While computing the Shapley values of training examples exactly is computationally prohibitive, approximation methods have been used successfully for classification models in computer vision tasks. We investigate data valuation for Automatic Speech Recognition (ASR) models, which perform a structured prediction task, and propose a method for estimating Shapley values for these models. We show that a proxy model can be learned for the acoustic model component of an end-to-end ASR system and used to estimate Shapley values for acoustic frames. We present a method for using the proxy acoustic model to estimate Shapley values for variable-length utterances and demonstrate that the Shapley values provide a signal of example quality.
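To make the idea of a data Shapley value concrete, here is a minimal sketch, not the paper's ASR method. It assumes a toy 1-D dataset and a hypothetical utility (1-nearest-neighbour accuracy on a small validation set); the function names are illustrative only. It computes exact values by enumerating all orderings, which is feasible only for a handful of examples, and a Monte Carlo estimate from sampled orderings, which is the kind of approximation the abstract alludes to.

```python
import itertools
import random


def utility(subset, train, val):
    """Hypothetical utility: 1-NN accuracy on `val` using only `subset` of `train`."""
    if not subset:
        return 0.0
    correct = 0
    for x, y in val:
        nearest = min(subset, key=lambda i: abs(train[i][0] - x))
        if train[nearest][1] == y:
            correct += 1
    return correct / len(val)


def marginal_contributions(order, train, val):
    """Gain in utility as each example is added, in the given order."""
    gains = [0.0] * len(train)
    chosen, previous = [], 0.0  # utility of the empty set is 0.0
    for i in order:
        chosen.append(i)
        current = utility(chosen, train, val)
        gains[i] = current - previous
        previous = current
    return gains


def exact_shapley(train, val):
    """Average marginal contribution over every ordering (n! of them)."""
    n = len(train)
    totals = [0.0] * n
    count = 0
    for order in itertools.permutations(range(n)):
        gains = marginal_contributions(order, train, val)
        totals = [t + g for t, g in zip(totals, gains)]
        count += 1
    return [t / count for t in totals]


def monte_carlo_shapley(train, val, num_permutations=200, seed=0):
    """Estimate Shapley values from randomly sampled orderings."""
    rng = random.Random(seed)
    n = len(train)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        gains = marginal_contributions(order, train, val)
        totals = [t + g for t, g in zip(totals, gains)]
    return [t / num_permutations for t in totals]


if __name__ == "__main__":
    # (feature, label) pairs; the last training example is deliberately mislabeled.
    train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (0.5, 1)]
    val = [(0.2, 0), (0.8, 0), (4.5, 1)]
    print(exact_shapley(train, val))
    print(monte_carlo_shapley(train, val))
```

In both functions the values sum to the utility of the full training set minus that of the empty set, which is a handy sanity check on any estimator of this kind.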
-
We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions. Importance is predicted for each utterance by a data augmentation agent that is trained to maximize the amount of noise it adds while minimizing its impact on recognition performance. The effectiveness of our method is illustrated on version two of the Google Speech Commands (GSC) dataset. On the standard GSC test set, it achieves a 23.3% relative error rate reduction compared to conventional noise augmentation, which applies noise to speech without regard to where it might be most effective. It also provides a 25.4% error rate reduction compared to a baseline without data augmentation. Additionally, the proposed ImportantAug outperforms the conventional noise augmentation and the baseline on two test sets with additional noise added.
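The core operation, adding noise only where the speech is predicted to be unimportant, can be sketched as follows. This is an illustration under stated assumptions rather than the ImportantAug implementation: it assumes spectrograms stored as nested lists of linear magnitudes, simple additive mixing, and an importance map in [0, 1] that is already given (in the paper it is predicted by a trained agent). The function names are hypothetical.

```python
def add_masked_noise(speech, noise, importance):
    """Add noise scaled by (1 - importance) at each time-frequency cell.

    All three arguments are equally shaped lists of rows (time x frequency).
    importance = 1.0 leaves a cell untouched; 0.0 adds the full noise value.
    """
    return [
        [s + (1.0 - m) * n for s, n, m in zip(s_row, n_row, m_row)]
        for s_row, n_row, m_row in zip(speech, noise, importance)
    ]


def fraction_obscured(importance, threshold=0.5):
    """Share of cells whose importance falls below `threshold` (assumed cutoff)."""
    cells = [m for row in importance for m in row]
    return sum(1 for m in cells if m < threshold) / len(cells)


if __name__ == "__main__":
    speech = [[1.0, 2.0], [3.0, 4.0]]
    noise = [[0.5, 0.5], [0.5, 0.5]]
    importance = [[1.0, 0.0], [0.0, 1.0]]
    print(add_masked_noise(speech, noise, importance))
    print(fraction_obscured(importance))
```

Conventional noise augmentation corresponds to an importance map of all zeros, so every cell receives the full noise.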
-
We are sharing the Ecoacoustic Dataset from Arctic North Slope Alaska (EDANSA-2019), a dataset with audio samples collected from an area of 9000 square miles throughout the 2019 summer season on the North Slope of Alaska and neighboring regions. There are over 27 hours of labeled data according to 28 tags, with enough instances of 9 important environmental classes to train baseline convolutional recognizers. Please see the following GitHub page for the accompanying publication, updates about the dataset, and baseline code: https://github.com/speechLabBcCuny/EDANSA-2019
-
The Arctic is warming at three times the rate of the global average, affecting the habitat and lifecycles of migratory species that reproduce there, like birds and caribou. Ecoacoustic monitoring can help efficiently track changes in animal phenology and behavior over large areas so that the impacts of climate change on these species can be better understood and potentially mitigated. We introduce here the Ecoacoustic Dataset from Arctic North Slope Alaska (EDANSA-2019), a dataset collected by a network of 100 autonomous recording units covering an area of 9000 square miles over the course of the 2019 summer season on the North Slope of Alaska and neighboring regions. We labeled over 27 hours of this dataset according to 28 tags, with enough instances of 9 important environmental classes to train baseline convolutional recognizers. We are releasing this dataset and the corresponding baseline to the community to accelerate the recognition of these sounds and facilitate automated analyses of large-scale ecoacoustic databases.
-
Arctic boreal forests are warming at a rate 2–3 times faster than the global average. It is important to understand the effects of this warming on the activities of animals that migrate to these environments annually to reproduce. Acoustic sensors can monitor a wide area relatively cheaply, producing large amounts of data that need to be automatically analyzed. In such scenarios, only a small proportion of the recorded data can be labeled by hand, so we explore two methods for utilizing labels more efficiently: self-supervised learning using wav2vec 2.0 and data valuation using k-nearest neighbors approximations to compute Shapley values. We confirm that data augmentation and global temporal pooling improve performance by more than 30%, demonstrate for the first time the utility of Shapley data valuation for audio classification, and find that our wav2vec 2.0 model trained from scratch does not improve performance.
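For readers unfamiliar with k-nearest-neighbors Shapley valuation, the sketch below shows one well-known closed-form recursion for an unweighted K-NN classifier and a single test point. Whether the paper uses exactly this variant is an assumption; the 1-D features, the labels, and the function name are illustrative only. The utility being attributed is the fraction of the K nearest training examples whose label matches the test label.

```python
def knn_shapley(train, test_point, k):
    """Shapley value of each training example for one test point.

    `train` is a list of (feature, label) pairs with scalar features;
    `test_point` is a single (feature, label) pair.
    """
    x_test, y_test = test_point
    n = len(train)
    # Training indices sorted from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda i: abs(train[i][0] - x_test))
    match = [1.0 if train[i][1] == y_test else 0.0 for i in order]

    values = [0.0] * n
    # Start from the farthest example, then recurse toward the nearest.
    values[order[-1]] = match[-1] / n
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank of this example by distance
        step = (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        values[order[pos]] = values[order[pos + 1]] + step
    return values


if __name__ == "__main__":
    train = [(1.0, "bird"), (2.0, "wind"), (3.0, "bird")]
    print(knn_shapley(train, (0.0, "bird"), k=2))
```

The recursion needs only one sort per test point, which is what makes this approach practical when exact enumeration over subsets is not. Values for a whole test set are typically obtained by averaging the per-test-point values.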
-
Human listeners use specific cues to recognize speech, and recent experiments have shown that certain time-frequency regions of individual utterances are more important to their correct identification than others. A model that could identify such cues or regions from clean speech would facilitate speech recognition and speech enhancement by focusing on those important regions. Thus, in this paper we present a model that can predict the regions of individual utterances that are important to an automatic speech recognition (ASR) “listener” by learning to add as much noise as possible to these utterances while still permitting the ASR to correctly identify them. This work utilizes a continuous speech recognizer to recognize multi-word utterances and builds upon our previous work that performed the same process for an isolated word recognizer. Our experimental results indicate that our model can apply noise to obscure 90.5% of the spectrogram while leaving recognition performance nearly unchanged.